Stupid Art Project 2017 - Random Holy Text

To fit the stupid art theme "Religion" and my current focus on machine learning, I'm going to generate some random holy text.

The code below comes from http://agiliq.com/blog/2009/06/generating-pseudo-random-text-with-markov-chains-u/ and will be my first attempt at random holy text generation.


In [1]:
import random


class Markov(object):
    def __init__(self, open_file):
        self.cache = {}
        self.open_file = open_file
        self.words = self.file_to_words()
        self.word_size = len(self.words)
        self.database()

    def file_to_words(self):
        self.open_file.seek(0)
        data = self.open_file.read()
        words = data.split()
        return words

    def triples(self):
        """ Generates triples from the given data string. So if our string were
                "What a lovely day", we'd generate (What, a, lovely) and then
                (a, lovely, day).
        """

        if len(self.words) < 3:
            return

        for i in range(len(self.words) - 2):
            yield (self.words[i], self.words[i + 1], self.words[i + 2])

    def database(self):
        for w1, w2, w3 in self.triples():
            key = (w1, w2)
            if key in self.cache:
                self.cache[key].append(w3)
            else:
                self.cache[key] = [w3]

    def generate_markov_text(self, size=25):
        seed = random.randint(0, self.word_size - 3)
        seed_word, next_word = self.words[seed], self.words[seed + 1]
        w1, w2 = seed_word, next_word
        gen_words = []
        for i in range(size):
            gen_words.append(w1)
            w1, w2 = w2, random.choice(self.cache[(w1, w2)])
        gen_words.append(w2)

        return ' '.join(gen_words)
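
To make the cache concrete, here is a minimal standalone sketch of the same pair-to-next-word table, built from a short made-up sentence. The name `build_table` and the sample sentence are my own illustrative assumptions, not part of the borrowed code.

```python
from collections import defaultdict


def build_table(words):
    """Map each consecutive word pair to the list of words that follow it."""
    table = defaultdict(list)
    # zip over three staggered views of the list to get every triple
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        table[(w1, w2)].append(w3)
    return dict(table)


# Made-up sample text, just for illustration
words = "the lord is my shepherd and the lord is my rock".split()
table = build_table(words)

print(table[("the", "lord")])  # ['is', 'is']
print(table[("is", "my")])     # ['shepherd', 'rock']
```

A pair seen more than once keeps every follower, duplicates included, so `random.choice` picks followers in proportion to how often they occur in the text.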

I'll use the Old Testament first. I got the text from Project Gutenberg and removed the Gutenberg preamble and postamble. I suspect the text will be too big to run efficiently in the code above, as it is just plain Python. We shall see.


In [3]:
with open('sources/bible.txt') as bible:
    bible_gen = Markov(bible)

In [4]:
bible_gen.generate_markov_text()


Out[4]:
"from the dead. 6:17 For Herod feared John, knowing that she die: because thou hadst been here, my brother Absalom's house. 13:21 But if any speak,"

In [5]:
%%timeit
bible_gen.generate_markov_text()


10000 loops, best of 3: 39.5 µs per loop

It works, and it is fast. But the punctuation and the passage numbering are not the best. I'll try post-processing the responses.

Let's first treat the passage numbers appearing mid-passage, things like '7:12'.


In [6]:
import re

A '7:12' at the beginning of a passage is actually desirable, so I'd like to keep those, and possibly force them later. It is the passage numbers appearing in the middle or at the end of passages that I want to avoid. Those are guaranteed to have a space in front of them, i.e. ' 7:12', so I can find and replace them with a regex built on that pattern.
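
A quick sketch of why the leading space matters, using a sample string stitched together from the earlier output (the variable names here are just for illustration):

```python
import re

# Leading space in the pattern: only numbers that follow another word match
pattern = re.compile(r' \d+:\d+')

sample = "6:17 For Herod feared John, knowing that she die: 13:21 But if any speak,"
cleaned = pattern.sub('', sample)

print(cleaned)
# 6:17 For Herod feared John, knowing that she die: But if any speak,
```

The '6:17' at the very start has no space before it, so it survives, while ' 13:21' in the middle is removed.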


In [7]:
passage_num_pattern = re.compile(r' \d+:\d+')

In [8]:
for _ in range(3):
    passage = bible_gen.generate_markov_text()
    print(passage)
    passage = passage_num_pattern.sub('', passage)
    print(passage)


and, lo, I perceived that it be tried unto the end, the same did God send to meet them, and embraced him, and went and them,
and, lo, I perceived that it be tried unto the end, the same did God send to meet them, and embraced him, and went and them,
to cease, till the buriers have buried him. And Jesus went with Elisha from Gilgal. 2:2 And it came to pass, while my breath is to
to cease, till the buriers have buried him. And Jesus went with Elisha from Gilgal. And it came to pass, while my breath is to
up dust upon their corpses: 3:4 Because of thy servants: for I will meet with the life of a fool according to her as silver, precious
up dust upon their corpses: Because of thy servants: for I will meet with the life of a fool according to her as silver, precious

But I want to create random passages, so they need to start with a passage number and contain a few "complete" sentences, i.e. they end in a period. I think I need to alter the generating method, and I think the alterations will be particular to the Old Testament. So I'll extend the Markov class with a generating method specially for this text of the Old Testament.
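
The two requirements can be written down as a quick check. This is only a sketch; `looks_like_passage` is a hypothetical helper of mine and is not used by the class that follows.

```python
import re

# A passage number, a space, some text, and a final period
PASSAGE_SHAPE = re.compile(r'\d+:\d+ .+\.')


def looks_like_passage(text):
    """True if text starts with chapter:verse and ends in a period."""
    return PASSAGE_SHAPE.fullmatch(text) is not None


print(looks_like_passage('12:13 The wicked have inclosed me.'))   # True
print(looks_like_passage('and, lo, I perceived that it be tried'))  # False
```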


In [11]:
class OldTestaPassagesMarkov(Markov):
    passage_num_pattern = re.compile(r'\d+:\d+')
    
    def generate_markov_text(self, seed_word='', min_words=25):
        
        # Process a user given seed_word
        seed_word_locations = [idx for idx, word in enumerate(self.words) if word.lower() == seed_word.lower()]
            
        if seed_word_locations:
            seed = random.choice(seed_word_locations)
        else:
            print(seed_word + ' is not in Old Testament')
            seed = random.randint(0, self.word_size - 3)
            
        w1, w2 = self.words[seed], self.words[seed + 1]
        gen_words = [w1, w2]
        # go until we have enough words and end in a period
        while w2[-1] != '.' or len(gen_words) < min_words:
            w1, w2 = w2, random.choice(self.cache[(w1, w2)])
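            # Use the class pattern (no leading space); the module-level
            # pattern expects a leading space and never matches a single word
            if self.passage_num_pattern.findall(w2):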
                # Avoid adding passage numbers to the middle of the passage.
                # Also end a sentence when a passage number would have gone in.
                new_w1 = w1.replace(':', '.').replace(';', '.')
                gen_words[-1] = new_w1
            else:                
                gen_words.append(w2)

        return ' '.join(gen_words)

In [12]:
with open('sources/bible.txt') as bible:
    bible_gen = OldTestaPassagesMarkov(bible)

In [13]:
bible_gen.generate_markov_text('12:13', min_words=10)


Out[13]:
'12:13 The wicked have inclosed me: they have not delivered by much strength.'

In [14]:
len('So king Solomon passed all the cities that are entering to go into his own language.')


Out[14]:
84
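
Assuming the length check is about fitting a passage into a tweet (140 characters at the time of writing), one option is to keep generating until something short enough comes out. This is a sketch only: `first_fitting` is a hypothetical helper, and the stand-in generator below replaces the real `bible_gen.generate_markov_text`.

```python
def first_fitting(generate, limit=140, max_tries=50):
    """Call generate() until it returns text of at most `limit` characters."""
    for _ in range(max_tries):
        text = generate()
        if len(text) <= limit:
            return text
    return None  # nothing short enough was produced


# Stand-in generator for illustration: one text that is too long, then a short one
candidates = iter([
    'x' * 200,
    'So king Solomon passed all the cities that are entering to go into his own language.',
])
tweet = first_fitting(lambda: next(candidates))
print(len(tweet))  # 84
```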

Randomly generate the passage chapter and verse numbers.

The chapter:verse pairs (e.g. 4:34) are good places to start passages, as they always begin a sentence, so I'll always seed the generation process with one of them. But some chapter and verse combos do not exist in the text, so I can't just generate the pairs at random. Instead, I can sample directly from the set of pairs found in the words list.


In [15]:
passage_numbers = set()

for word in bible_gen.words:
    found_pattern = bible_gen.passage_num_pattern.findall(word)
    if found_pattern:
        passage_numbers.add(found_pattern[0])
list(passage_numbers)[:10]


Out[15]:
['31:16',
 '34:18',
 '91:12',
 '99:5',
 '30:10',
 '22:63',
 '5:23',
 '13:17',
 '50:28',
 '12:55']
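
Sampling a seed from that set could look like the sketch below. The tiny word list is made up so the example runs on its own; with the real data, `words` would be `bible_gen.words`.

```python
import random
import re

passage_num_pattern = re.compile(r'\d+:\d+')

# Made-up word list standing in for bible_gen.words
words = "1:1 In the beginning 1:2 And the earth 2:1 Thus the heavens".split()

# random.choice needs a sequence, so turn the set into a sorted list
passage_numbers = sorted({w for w in words if passage_num_pattern.fullmatch(w)})
seed_word = random.choice(passage_numbers)

print(passage_numbers)  # ['1:1', '1:2', '2:1']
print(seed_word)
```

The chosen `seed_word` can then be passed straight to `generate_markov_text`, since it is guaranteed to occur in the text.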

Twitter bot experiment

Following this website, I set up a Twitter account (HolyStupidArt) and a Twitter app attached to that account. http://www.dototot.com/how-to-write-a-twitter-bot-with-python-and-tweepy/

I've also set up my API access and sent my first message, the one about King Solomon above. Now I'm going to experiment with the Twitter interface to see if I can make the bot respond to direct messages.


In [16]:
import tweepy

from twitter_secrets import api_tokens as at


# enter the corresponding information from your Twitter application:
auth = tweepy.OAuthHandler(at['CONSUMER_KEY'], at['CONSUMER_SECRET'])
auth.set_access_token(at['ACCESS_KEY'], at['ACCESS_SECRET'])
api = tweepy.API(auth)

In [18]:
user = api.followers_ids()

In [19]:
user


Out[19]:
[2176913060, 159166590]

In [21]:
dm = api.sent_direct_messages()

In [22]:
import time

In [23]:
followers = set(api.followers_ids())

while True:
    time.sleep(15)
    current_followers = set(api.followers_ids())
    if current_followers != followers:
        for user_id in current_followers - followers:
            api.send_direct_message(user_id=user_id, text='Hi new follower')
            print('DM sent to new follower')
        followers = current_followers
    break


---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-23-d2fda8568df4> in <module>()
      2 
      3 while True:
----> 4     time.sleep(15)
      5     current_followers = set(api.followers_ids())
      6     if current_followers != followers:

KeyboardInterrupt: 
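
The polling cell relies on a set difference to spot new followers. Here is that step on its own, without any API calls; `new_follower_ids` is a hypothetical helper and the third id is made up.

```python
def new_follower_ids(previous, current):
    """Ids present in the current follower list but not in the previous one."""
    return set(current) - set(previous)


followers = {2176913060, 159166590}
current_followers = {2176913060, 159166590, 123456789}  # last id is invented

for user_id in sorted(new_follower_ids(followers, current_followers)):
    print('would send a DM to', user_id)
```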

In [24]:
import threading
import time
from sys import stdout

# Keep the shared state on a class so both threads read and write the same attributes
class Counter(object):
    counter = 0
    stop = False

# Function that will be executed in parallel with the rest of the code
def function_1():
    for i in range(10):
        if c.stop: return # More stop checks make the thread exit sooner
        time.sleep(2)
        c.counter += 1
        if c.stop: return

# Create a class instance
c = Counter()

# Thread function_1
d = threading.Thread(target=function_1)
d.start()

# Exit the thread properly before exiting...
try:
    for j in range(100):
        stdout.write('\r{:}'.format(c.counter))
        stdout.flush()
        time.sleep(1)
except:
    c.stop = True


7

Not sure how to make this work for my example right off. I'll check another solution first.
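
One way this might be adapted, as a sketch only: run the polling in a background thread and use a `threading.Event` to stop it cleanly. The names `poll`, `on_tick`, and `ticks` are assumptions for illustration, and the callback just counts instead of calling Twitter.

```python
import threading
import time


def poll(stop_event, interval, on_tick):
    """Call on_tick every `interval` seconds until stop_event is set."""
    # Event.wait returns False on timeout and True once the event is set
    while not stop_event.wait(interval):
        on_tick()


ticks = []
stop = threading.Event()
worker = threading.Thread(target=poll, args=(stop, 0.01, lambda: ticks.append(1)))
worker.start()

time.sleep(0.2)   # the main thread is free to do other work here
stop.set()        # ask the worker to finish
worker.join()     # wait for it to exit

print(len(ticks) > 0, worker.is_alive())
```

Because the wait doubles as the sleep, the thread reacts to `stop.set()` right away rather than finishing a full sleep first.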

This one is from http://stackoverflow.com/questions/31116213/streaming-twitter-direct-messages and it works great. The direct messages are going in and out through on_data, not on_direct_message; not sure what that is about.


In [25]:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy import API

from tweepy.streaming import StreamListener


class StdOutListener( StreamListener ):

    def __init__( self ):
        # Let StreamListener run its own setup before adding our state
        super().__init__()
        self.tweetCount = 0

    def on_connect( self ):
        print("Connection established!!")

    def on_disconnect( self, notice ):
        print("Connection lost!! : ", notice)

    def on_direct_message( self, status ):
        print("Entered on_direct_message()")
        try:
            print(status, flush = True)
            return True
        except BaseException as e:
            print("Failed on_direct_message()", str(e))

    def on_error( self, status ):
        print(status)

def main():

    try:
        auth = tweepy.OAuthHandler(at['CONSUMER_KEY'], at['CONSUMER_SECRET'])
        auth.secure = True
        auth.set_access_token(at['ACCESS_KEY'], at['ACCESS_SECRET'])

        api = API(auth)

        # If the authentication was successful, you should
        # see the name of the account print out
        print(api.me().name)

        stream = Stream(auth, StdOutListener())

        stream.userstream()

    except BaseException as e:
        print("Error in main()", e)

In [26]:
main()


HolyStupidArtBot
Connection established!!
Error in main() 

In [27]:
import json
from pprint import pprint

In [28]:
s = '{"direct_message":{"id":797525813570568196,"id_str":"797525813570568196","text":"test","sender":{"id":797228167974895616,"id_str":"797228167974895616","name":"HolyStupidArtBot","screen_name":"HolyStupidArt","location":null,"url":null,"description":null,"protected":false,"followers_count":1,"friends_count":1,"listed_count":0,"created_at":"Sat Nov 12 00:03:01 +0000 2016","favourites_count":0,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":1,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"F5F8FA","profile_background_image_url":null,"profile_background_image_url_https":null,"profile_background_tile":false,"profile_image_url":"http:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_1_normal.png","profile_image_url_https":"https:\/\/abs.twimg.com\/sticky\/default_profile_images\/default_profile_1_normal.png","profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":true,"following":false,"follow_request_sent":false,"notifications":false,"translator_type":"none"},"sender_id":797228167974895616,"sender_id_str":"797228167974895616","sender_screen_name":"HolyStupidArt","recipient":{"id":159166590,"id_str":"159166590","name":"Andrew Brown","screen_name":"salvor7","location":"vancouver","url":null,"description":"Today, who knows","protected":false,"followers_count":111,"friends_count":170,"listed_count":1,"created_at":"Thu Jun 24 17:02:23 +0000 2010","favourites_count":19,"utc_offset":-21600,"time_zone":"Central Time (US & Canada)","geo_enabled":true,"verified":false,"statuses_count":215,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/731256308909772800\/uHAlRwB-_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/731256308909772800\/uHAlRwB-_normal.jpg","profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":false,"follow_request_sent":false,"notifications":false,"translator_type":"none"},"recipient_id":159166590,"recipient_id_str":"159166590","recipient_screen_name":"salvor7","created_at":"Sat Nov 12 19:45:45 +0000 2016","entities":{"hashtags":[],"symbols":[],"user_mentions":[],"urls":[]}}}'

In [29]:
json_obj = json.loads(s)

In [30]:
pprint(json_obj)


{'direct_message': {'created_at': 'Sat Nov 12 19:45:45 +0000 2016',
                    'entities': {'hashtags': [],
                                 'symbols': [],
                                 'urls': [],
                                 'user_mentions': []},
                    'id': 797525813570568196,
                    'id_str': '797525813570568196',
                    'recipient': {'contributors_enabled': False,
                                  'created_at': 'Thu Jun 24 17:02:23 +0000 '
                                                '2010',
                                  'default_profile': True,
                                  'default_profile_image': False,
                                  'description': 'Today, who knows',
                                  'favourites_count': 19,
                                  'follow_request_sent': False,
                                  'followers_count': 111,
                                  'following': False,
                                  'friends_count': 170,
                                  'geo_enabled': True,
                                  'id': 159166590,
                                  'id_str': '159166590',
                                  'is_translation_enabled': False,
                                  'is_translator': False,
                                  'lang': 'en',
                                  'listed_count': 1,
                                  'location': 'vancouver',
                                  'name': 'Andrew Brown',
                                  'notifications': False,
                                  'profile_background_color': 'C0DEED',
                                  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
                                  'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
                                  'profile_background_tile': False,
                                  'profile_image_url': 'http://pbs.twimg.com/profile_images/731256308909772800/uHAlRwB-_normal.jpg',
                                  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/731256308909772800/uHAlRwB-_normal.jpg',
                                  'profile_link_color': '1DA1F2',
                                  'profile_sidebar_border_color': 'C0DEED',
                                  'profile_sidebar_fill_color': 'DDEEF6',
                                  'profile_text_color': '333333',
                                  'profile_use_background_image': True,
                                  'protected': False,
                                  'screen_name': 'salvor7',
                                  'statuses_count': 215,
                                  'time_zone': 'Central Time (US & Canada)',
                                  'translator_type': 'none',
                                  'url': None,
                                  'utc_offset': -21600,
                                  'verified': False},
                    'recipient_id': 159166590,
                    'recipient_id_str': '159166590',
                    'recipient_screen_name': 'salvor7',
                    'sender': {'contributors_enabled': False,
                               'created_at': 'Sat Nov 12 00:03:01 +0000 2016',
                               'default_profile': True,
                               'default_profile_image': True,
                               'description': None,
                               'favourites_count': 0,
                               'follow_request_sent': False,
                               'followers_count': 1,
                               'following': False,
                               'friends_count': 1,
                               'geo_enabled': False,
                               'id': 797228167974895616,
                               'id_str': '797228167974895616',
                               'is_translation_enabled': False,
                               'is_translator': False,
                               'lang': 'en',
                               'listed_count': 0,
                               'location': None,
                               'name': 'HolyStupidArtBot',
                               'notifications': False,
                               'profile_background_color': 'F5F8FA',
                               'profile_background_image_url': None,
                               'profile_background_image_url_https': None,
                               'profile_background_tile': False,
                               'profile_image_url': 'http://abs.twimg.com/sticky/default_profile_images/default_profile_1_normal.png',
                               'profile_image_url_https': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_1_normal.png',
                               'profile_link_color': '1DA1F2',
                               'profile_sidebar_border_color': 'C0DEED',
                               'profile_sidebar_fill_color': 'DDEEF6',
                               'profile_text_color': '333333',
                               'profile_use_background_image': True,
                               'protected': False,
                               'screen_name': 'HolyStupidArt',
                               'statuses_count': 1,
                               'time_zone': None,
                               'translator_type': 'none',
                               'url': None,
                               'utc_offset': None,
                               'verified': False},
                    'sender_id': 797228167974895616,
                    'sender_id_str': '797228167974895616',
                    'sender_screen_name': 'HolyStupidArt',
                    'text': 'test'}}

In [31]:
json_obj['direct_message']['sender']['name'], json_obj['direct_message']['sender']['id'], json_obj['direct_message']['text']


Out[31]:
('HolyStupidArtBot', 797228167974895616, 'test')
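
Those lookups could be wrapped in a small helper that takes the raw JSON string, as it would arrive in on_data, and returns nothing for payloads that are not direct messages. A sketch only: `summarize_direct_message` is a hypothetical name, and the payload below is trimmed down to the fields used.

```python
import json


def summarize_direct_message(raw_data):
    """Return (sender name, sender id, text) for a DM payload, else None."""
    payload = json.loads(raw_data)
    dm = payload.get('direct_message')
    if dm is None:
        return None  # some other kind of stream message
    return dm['sender']['name'], dm['sender']['id'], dm['text']


# Trimmed version of the payload above, keeping only the fields we read
raw = ('{"direct_message": {"text": "test", "sender": '
       '{"id": 797228167974895616, "name": "HolyStupidArtBot"}}}')

print(summarize_direct_message(raw))
# ('HolyStupidArtBot', 797228167974895616, 'test')
print(summarize_direct_message('{"something_else": []}'))
# None
```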

In [32]:
json_obj.keys()


Out[32]:
dict_keys(['direct_message'])

In [33]:
from tweepy.models import Status

In [34]:
status = Status.parse(api=api, json=json_obj)

In [35]:
dm = status._json['direct_message']

In [36]:
time.localtime()[3:6]


Out[36]:
(17, 59, 0)

In [37]:
import datetime as dt
dt.datetime.now()


Out[37]:
datetime.datetime(2016, 11, 12, 17, 59, 0, 214826)
